Abstract

We address the design of distributed cognitive medium access
control (MAC) protocols for opportunistic spectrum access
(OSA) under an energy constraint on the secondary users. The
objective is to maximize the expected number of information bits
that can be delivered by a secondary user during its battery
lifetime without causing interference to primary users. By absorbing
the residual energy level of the secondary user into the state
space, we formulate the energy-constrained OSA problem as
an unconstrained partially observable Markov decision process
(POMDP) and obtain the optimal spectrum sensing and access
policy.  We  analyze  and  reduce  the  computational  complexity  of 
the  optimal  policy.  We  also  propose  a  suboptimal  solution  to 
energy-constrained  OSA,  whose  computational  complexity  can 
be  systematically  traded  off  with  its  performance.  Numerical 
examples  are  provided  to  study  the  impact  of  spectrum  occupancy 
dynamics,  channel  fading  statistics,  and  energy  consumption 
characteristics  of  the  secondary  user  on  the  optimal  sensing  and 
access  decisions. 

I.  Introduction 

The  exponential  growth  in  wireless  services  has  resulted  in  an 
overly  crowded  spectrum.  In  contrast  to  the  apparent  spectrum 
scarcity  is  the  pervasive  existence  of  spectrum  opportunities. 
Real  measurements  show  that,  at  any  given  time  and  location,  a 
large portion of licensed spectrum lies unused [1]. Even when
a  frequency  band  is  actively  used,  the  bursty  arrivals  of  many 
applications  result  in  abundant  spectrum  opportunities  at  the 
millisecond  scale.  This  has  motivated  opportunistic  spectrum 
access (OSA), envisioned by the DARPA XG program [2]. The
idea  of  OSA  is  to  allow  secondary  users  to  identify  and  exploit 
spectrum  opportunities  under  the  constraint  that  they  do  not  cause 
harmful  interference  to  primary  users. 

There  is  a  growing  body  of  literature  on  the  design  of  medium 
access  control  (MAC)  for  OSA  [3]-[8].  Most  existing  works 
(see  [3]-[6])  consider  a  network  of  geographically  distributed 
secondary  users,  each  affected  by  a  different  set  of  primary  users 
whose  spectrum  access  activities  are  static  or  slowly  varying 
in time. The design objective is to allocate these spatially
varying  spectrum  opportunities  among  secondary  users  so  that 
the  network-level  spectrum  efficiency  is  maximized  subject  to 
some  regulatory  constraint  on  interference  to  primary  users. 

The  exploitation  of  temporal  spectrum  opportunities  resulting 
from  the  bursty  traffic  of  primary  users  has  been  studied  in 
[7],  [8].  Within  the  framework  of  partially  observable  Markov 
decision  process  (POMDP),  the  optimal  cognitive  MAC  protocol 
that  allows  secondary  users  to  independently  search  for  and 
exploit  instantaneous  spectrum  opportunities  has  been  developed 
in [7]. This MAC protocol consists of a sensing strategy that
determines  which  channels  in  the  spectrum  to  sense  based 
on  spectrum  occupancy  dynamics  and  an  access  strategy  that 
determines  whether  to  transmit  over  the  sensed  channels  based 
on  sensing  outcomes.  The  energy  constraint  of  secondary  users 
is,  however,  not  taken  into  account  in  [7],  [8]. 

The incorporation of an energy constraint can significantly
complicate the cognitive MAC design. Under the energy constraint,
sensing  decisions  should  be  made  based  on  not  only  the  spectrum 
occupancy  dynamics  but  also  channel  fading  statistics,  and  access 
decisions  should  take  into  account  not  only  the  availability  but 
also  the  fading  condition  of  the  sensed  channel.  This  makes 
the  optimal  sensing  and  access  strategies  opportunistic  in  both 
spectrum and time. Even the residual energy level of the
secondary user will play an important role in decision-making. For
example, when the battery is depleting, should the user wait
for increasingly better channel realizations for transmission or
should it lower its requirement on channel quality given that sensing
also costs energy? Clearly, such decisions depend on the energy
consumption characteristics of secondary users.

As  a  starting  point  to  energy-constrained  OSA,  this  paper  aims 
to  develop  the  fundamental  limit  on  the  expected  number  of 
information  bits  that  can  be  delivered  by  a  secondary  user  during 
its  battery  lifetime.  By  absorbing  the  residual  energy  level  of  the 
secondary  user  into  the  state  space,  we  show  that  the  energy- 
constrained  OSA  problem  can  be  formulated  as  an  unconstrained 
POMDR  Based  on  the  theory  of  POMDP,  we  obtain  the  optimal 
sensing  and  access  policy  which  not  only  provides  a  performance 
benchmark  but  also  enables  us  to  study  the  impact  of  spectrum 
occupancy  dynamics,  channel  fading  statistics,  and  energy  con¬ 
sumption  characteristics  of  the  secondary  user  on  the  optimal 
sensing  and  access  decisions.  However,  our  complexity  analysis 
indicates  that  the  optimal  policy  is  computationally  expensive. 
We  therefore  exploit  the  underlying  structure  of  the  problem 
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to  reduce  the  computational  complexity  of  the  optimal  policy. 
We  also  provide  a  suboptimal  solution  whose  computational 
complexity  can  be  systematically  traded  off  with  its  performance. 
Referred  to  as  the  greedy-//;  strategy,  this  approach  maximizes 
the  throughput  of  the  secondary  user  in  a  fixed  time  window 
of  w  slots.  Simulation  result  shows  that  as  the  window  size 
w  increases,  the  performance  of  the  greedy-///  strategy  quickly 
approaches  the  optimal  performance. 

II.  Problem  Statement 

Consider a spectrum consisting of $N$ slotted channels, each
with bandwidth $B_n$ ($n = 1, \ldots, N$). The spectrum is licensed
to a primary network. Let $S_n \in \{0\ \text{(occupied)},\ 1\ \text{(idle)}\}$ denote
the availability of channel $n$ in a slot. We assume that the
spectrum occupancy $\mathbf{S} = [S_1, \ldots, S_N] \in \{0,1\}^N$ follows a
discrete Markov process with $2^N$ states.

We  consider  an  ad  hoc  secondary  network  where  there  is  no 
central  coordinator  or  dedicated  communication/control  channel. 
Secondary users, each powered by a battery with initial energy
$E_0$, independently seek instantaneous spectrum opportunities in
these $N$ channels. At the beginning of each slot, a secondary user
with data to transmit chooses at most $M$ ($1 \le M \le N$) channels
to  sense  and  then  decides  whether  to  access  these  channels 
according  to  the  sensing  outcomes.  Our  goal  is  to  determine 
the  sensing  and  access  decisions  sequentially  in  each  slot  so  as 
to  maximize  the  total  expected  number  of  information  bits  that 
can  be  delivered  by  a  secondary  user  during  its  battery  lifetime. 
For  ease  of  presentation,  we  assume  M  =  1.  Our  results  can  be 
generalized  to  M  >  1. 

A.  Protocol  Structure 

A  channel  only  presents  an  opportunity  to  a  pair  of  secondary 
users  if  it  is  available  at  both  the  transmitter  and  the  receiver. 
Hence,  spectrum  opportunities  need  to  be  identified  jointly  by 
the transmitter and the receiver [9]. Next, we briefly comment
on the implementation of the protocol.

Suppose that the transmitter and the receiver have tuned to
the same channel after the initial handshake as described in
[9]. At the beginning of a slot, the transmitter and the receiver
hop to the same channel¹. If the channel is sensed to be available,
the transmitter generates a random backoff time. If the channel
remains idle when its backoff time expires, it transmits a short
request-to-send (RTS) message to the receiver, indicating that
the channel is available at the transmitter. Upon receiving the
RTS, the receiver estimates the channel fading condition using
the RTS, and then replies with a clear-to-send (CTS) message
if the channel is also available at the receiver. The receiver
also informs the transmitter of the current fading condition by
piggybacking the estimated channel state on the CTS. After a
successful exchange of RTS-CTS, the transmitter and the receiver
can communicate over this channel. At the end of this slot, the
receiver acknowledges every successful data transmission. Note
that at the beginning of each slot, the transmitter and the receiver
can also choose not to hop to any channel and turn to sleep mode
until the beginning of the next slot.

¹ Note that the protocols developed in this paper can ensure transceiver
synchronization without the help of any dedicated communication or control
channel. See Section III-C for details.

B.  Energy  Model 

We  assume  that  channels  between  the  secondary  user  and  its 
destination  follow  a  block  fading  model.  That  is,  the  channel  gain 
in  a  slot  is  a  random  variable  (RV)  identically  and  independently 
distributed  (i.i.d.)  across  slots  but  not  necessarily  i.i.d.  across 
channels. 

Let $E_s(n)$ and $E_{tx}(n)$ denote, respectively, the energy
consumed in sensing and accessing channel $n$ in a slot. For
simplicity, we assume that the sensing energy consumption $E_s(n)$ is
identical for all channels: $E_s(n) = e_s$ for every $n$. Note that the
transmission energy consumption $E_{tx}(n)$ is an RV depending on
the current fading condition of channel $n$. In general, the better
the channel condition, the lower the required transmission energy.
Let $L$ be the number of power levels at which the secondary user
can transmit and $\varepsilon_k$ the energy consumed in transmitting at the
$k$-th power level in a slot. The transmission energy consumption
$E_{tx}(n)$ thus has realizations restricted to a finite set $\mathcal{E}_{tx}$ given by

$$E_{tx}(n) \in \mathcal{E}_{tx} = \{\varepsilon_k\}_{k=0}^{L}, \tag{1}$$

where $0 < \varepsilon_1 < \ldots < \varepsilon_L < \infty$ and $\varepsilon_0 = 0$ indicates that the
secondary user does not transmit. We also consider the energy
$e_p$ consumed by the secondary user in sleeping mode.

Let $E$ denote the residual energy level of a secondary user at
the beginning of a slot. Note that $E$ is an RV determined by the
channel conditions and the sensing and access decisions in all
previous slots. Thus, $E$ belongs to a finite set $\mathcal{E}_r$ given by

$$E \in \mathcal{E}_r = \Big\{ e : e = E_0 - c\,e_p - \sum_{k=0}^{L} c_k (e_s + \varepsilon_k),\ e \ge 0,\ c, c_k \ge 0,\ c, c_k \in \mathbb{Z} \Big\} \cup \{0\}, \tag{2}$$

where $c_k$ is the number of slots in which the secondary user chooses
to sense a channel and then transmit over it at the $k$-th power
level and $c$ is the number of slots in which the secondary user turns
to sleeping mode.
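The set in (2) can be enumerated by repeatedly subtracting the energy cost of one slot. The Python sketch below is only an illustration: the function name `residual_energy_levels` is hypothetical, and energies are assumed to be integers (for example, tenths of an energy unit) so that set membership is exact.

```python
def residual_energy_levels(e0, e_s, e_p, eps):
    """Enumerate the residual-energy set of (2) in integer energy units.

    e0  : initial battery energy E_0
    e_s : sensing energy per slot
    e_p : sleeping energy per slot
    eps : transmission energies, eps[0] = 0 (no transmission)
    """
    # Energy spent in one slot: sleep, or sense and transmit at level k.
    slot_costs = {e_p} | {e_s + e_k for e_k in eps}
    levels = {0, e0}          # (2) always contains 0 and the initial level
    frontier = [e0]
    while frontier:
        e = frontier.pop()
        for cost in slot_costs:
            nxt = e - cost
            if nxt >= 0 and nxt not in levels:
                levels.add(nxt)
                frontier.append(nxt)
    return sorted(levels)
```

When the sleeping cost equals one energy unit, every integer level between 0 and the initial energy is reachable, so the size of the set is governed by the smallest per-slot cost.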

III.  Optimal  Energy-Constrained  OSA 

The energy-constrained OSA can be formulated as a constrained
POMDP, which is usually more difficult to solve than an
unconstrained one. By absorbing the residual energy level of the
secondary user into the state space, we reduce the constrained
POMDP to an unconstrained one. Based on the theory of
POMDP, we obtain the optimal spectrum sensing and access
policy.

A.  An  Unconstrained  POMDP  Formulation 

State Space: In each slot, the network state is characterized by
the current spectrum occupancy $\mathbf{S} \in \{0,1\}^N$ and the residual
energy level $E \in \mathcal{E}_r$ of the secondary user at the beginning of
this slot. The state space $\mathcal{S}$ can be defined as

$$(\mathbf{S}, E) \in \mathcal{S} = \{(\mathbf{s}, e) : \mathbf{s} \in \{0,1\}^N,\ e \in \mathcal{E}_r\}. \tag{3}$$

Action Space: After the state transition of spectrum occupancy at
the beginning of each slot, the secondary user can either choose
a channel $a \in \{1, \ldots, N\}$ to sense or turn to sleep ($a = 0$).
If the secondary user chooses channel $a$ to sense, then it will
obtain a sensing outcome $\Theta_a \in \{0, 1, \ldots, L\}$ which reflects the
occupancy state and the fading condition of the chosen channel:
$\Theta_a = 0$ indicates that channel $a$ is busy (i.e., $S_a = 0$) and
$\Theta_a = k$ ($k = 1, \ldots, L$) indicates that channel $a$ is idle (i.e., $S_a = 1$) and
the fading condition requires the secondary user to transmit at the
$k$-th power level (i.e., $E_{tx}(a) = \varepsilon_k$). Given sensing outcome $\Theta_a$,
the secondary user decides whether to transmit over the chosen
channel. Let $\Phi_a(k) \in \{0\ \text{(no access)},\ 1\ \text{(access)}\}$ ($k = 0, \ldots, L$)
denote the access decision under sensing outcome $\Theta_a = k$. Since
we have assumed perfect spectrum sensing, the access decision
under $\Theta_a = 0$ (busy) is simple: $\Phi_a(0) = 0$ (no access). In this
case, secondary users will not collide with primary users.

The action space $\mathcal{A}$ consists of all sensing decisions $a$ and
access decisions $\Phi_a = [\Phi_a(1), \ldots, \Phi_a(L)]$:

$$(a, \Phi_a) \in \mathcal{A} = \{(0, [0, \ldots, 0])\} \cup \{(a, \phi) : a \in \{1, \ldots, N\},\ \phi = [\phi(1), \ldots, \phi(L)] \in \{0,1\}^L\}. \tag{4}$$

Note that the access decision $\Phi_0$ associated with sensing action
$a = 0$ (sleeping mode) is determined by $\Phi_0(k) = 0$ for all
$1 \le k \le L$.


Network State Transition: Recall that the network state consists
of two parts: the spectrum occupancy $\mathbf{S}$ and the residual energy $E$
of the secondary user. At the beginning of each slot, the spectrum
occupancy $\mathbf{S}$ transits independently of the residual energy $E$
according to transition probabilities $\{p_{\mathbf{s},\mathbf{s}'}\}$, where $p_{\mathbf{s},\mathbf{s}'}$ denotes
the probability that the spectrum occupancy state transits from
$\mathbf{s} \in \{0,1\}^N$ to $\mathbf{s}' \in \{0,1\}^N$. In this paper, we assume that
the spectrum occupancy dynamics $\{p_{\mathbf{s},\mathbf{s}'}\}$ are known and remain
unchanged during the battery lifetime of the secondary user.

If the secondary user decides to choose channel $a \in \{1, \ldots, N\}$
to sense in this slot, then it will consume $e_s$ in
sensing and $\Phi_a(\Theta_a)\,\varepsilon_{\Theta_a}$ in transmitting. Thus, at the end of
this slot, the residual energy of the secondary user reduces to
$E' = T_E(E \mid a, \Theta_a, \Phi_a(\Theta_a))$:

$$T_E(E \mid a, \Theta_a, \Phi_a(\Theta_a)) = \begin{cases} E - e_p, & a = 0, \\ \max\{E - e_s - \Phi_a(\Theta_a)\,\varepsilon_{\Theta_a},\ 0\}, & a \ne 0, \end{cases} \tag{5}$$

where $e_p$ is the energy consumed in sleeping mode.
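The energy transition in (5) can be written as a small function. This is a minimal sketch, assuming integer energy units and a list `eps` with `eps[0] = 0`; the name `next_energy` is illustrative and does not come from the paper.

```python
def next_energy(e, a, k, access, e_s, e_p, eps):
    """Residual energy at the end of a slot, following (5).

    e      : residual energy at the beginning of the slot
    a      : sensing action (0 = sleep, otherwise the sensed channel)
    k      : sensing outcome (0 = busy, k >= 1 = required power level)
    access : access decision for outcome k (0 or 1)
    """
    if a == 0:
        # Sleeping only costs the sleep-mode energy.
        return e - e_p
    # Sensing always costs e_s; transmitting adds eps[k] when access = 1.
    return max(e - e_s - access * eps[k], 0)
```

The `max` with 0 mirrors (5): a transmission attempted with insufficient energy simply drains the battery.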

Observations: Due to partial spectrum sensing, the secondary
user does not have full knowledge of the spectrum occupancy
state in each slot. It can, however, obtain the occupancy state of
the chosen channel $a \in \{1, \ldots, N\}$ from sensing outcome (i.e.,
observation) $\Theta_a \in \{0, 1, \ldots, L\}$. Let $q_{\mathbf{s}}^{(a)}(k)$ be the probability
that the secondary user observes $\Theta_a = k$ in the chosen channel
$a$ given current spectrum occupancy state $\mathbf{S} = \mathbf{s}$. Under perfect
spectrum sensing, we have that

$$q_{\mathbf{s}}^{(a)}(k) = \Pr\{\Theta_a = k \mid \mathbf{S} = \mathbf{s}\} = \begin{cases} 1_{[k \ne 0]}\, p_a(k), & \text{if } a \ne 0,\ s_a = 1, \\ 1_{[k = 0]}, & \text{if } a \ne 0,\ s_a = 0, \end{cases} \tag{6}$$

where $p_a(k) = \Pr\{E_{tx}(a) = \varepsilon_k\}$ is the probability that the
fading condition of channel $a$ requires the secondary user to
transmit at the $k$-th power level, and $1_{[x]}$ is the indicator function:
$1_{[x]} = 1$ if $x$ is true and 0 otherwise. Note that $\{p_a(k)\}_{k=1}^{L}$
are determined by the fading statistics of channel $a$ and are
independent of the spectrum occupancy state. From (6), we can
see that $\sum_{k=0}^{L} q_{\mathbf{s}}^{(a)}(k) = 1$ for any spectrum occupancy state
$\mathbf{s} \in \{0,1\}^N$ and any chosen channel $a \in \{1, \ldots, N\}$.

Note that if the secondary user turns to sleep, then it will not
have any sensing outcome. We can define $\{q_{\mathbf{s}}^{(0)}(k)\}_{k=0}^{L}$ as arbitrary
values that satisfy $\sum_{k=0}^{L} q_{\mathbf{s}}^{(0)}(k) = 1$. For simplicity, we define
$q_{\mathbf{s}}^{(0)}(k) = 1_{[k=0]}$.
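For illustration, the observation probabilities in (6), together with the sleeping-mode convention, can be sketched in Python as follows. The function name `obs_prob` and the data layout (a tuple `s` of 0/1 occupancy values and a table `p[a-1][k]` of fading probabilities) are assumptions made for this example.

```python
def obs_prob(a, k, s, p):
    """Probability of observing outcome k on channel a in state s, as in (6).

    s : tuple of occupancy states, s[n-1] is 1 if channel n is idle
    p : p[a-1][k] = Pr{E_tx(a) = eps_k}; the k = 0 entry is never used
    """
    if a == 0:
        # Sleeping mode: no observation, by convention q = 1[k = 0].
        return 1.0 if k == 0 else 0.0
    if s[a - 1] == 1:
        # Idle channel: a non-zero outcome reveals the fading level.
        return p[a - 1][k] if k != 0 else 0.0
    # Busy channel: perfect sensing always reports outcome 0.
    return 1.0 if k == 0 else 0.0
```

Summing `obs_prob(a, k, s, p)` over all outcomes `k` gives 1 for every state and action, which matches the normalization noted after (6).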

Reward Structure: At the end of each slot, the secondary
user obtains a non-negative reward $R_{E,\Theta_a}^{(a,\Phi_a(\Theta_a))}$ depending on
its residual energy $E$ at the beginning of this slot, the sensing
outcome $\Theta_a$, and the sensing and access decisions $(a, \Phi_a(\Theta_a))$.
Assuming that the number of information bits that can be
transmitted over a channel in one slot is proportional to the
channel bandwidth, we define the immediate reward $R_{E,\Theta_a}^{(a,\Phi_a(\Theta_a))}$ as

$$R_{E,\Theta_a}^{(a,\Phi_a(\Theta_a))} \triangleq \begin{cases} 0, & a = 0, \\ \Phi_a(\Theta_a)\, B_a\, 1_{[E - e_s - \varepsilon_{\Theta_a} \ge 0]}, & a \ne 0. \end{cases} \tag{7}$$

That is, a reward is obtained if and only if the secondary user
chooses to sense and access (i.e., $a \ne 0$, $\Phi_a(\Theta_a) = 1$) an
idle channel (i.e., $\Theta_a \ne 0$) and its residual energy is enough
to cope with the channel fade in the selected channel (i.e.,
$E - e_s - \varepsilon_{\Theta_a} \ge 0$). Note that no reward will be accumulated
once the battery energy level drops below $e_s + \varepsilon_1$, where $\varepsilon_1$ is
the least required transmission energy. Hence, the total expected
accumulated reward represents the total expected number of
information bits that can be delivered by the secondary user
during its battery lifetime.
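A direct transcription of the reward in (7) is sketched below. The function name `reward` and the argument layout are illustrative assumptions; energies are again taken to be integers.

```python
def reward(e, a, k, access, bandwidth, e_s, eps):
    """Immediate reward of (7): bits delivered in the current slot.

    e         : residual energy at the beginning of the slot
    a         : sensing action (0 = sleep)
    k         : sensing outcome (0 = busy)
    access    : access decision for outcome k (0 or 1)
    bandwidth : bandwidth[a-1] is B_a
    """
    if a == 0 or k == 0 or not access:
        # Sleeping, a busy channel, or declining to transmit earns nothing.
        return 0
    # A transmission only counts if the battery can cover sensing plus eps[k].
    return bandwidth[a - 1] if e - e_s - eps[k] >= 0 else 0
```

With `e_s = 5` and `eps = [0, 10, 20]`, for example, a user with 15 units can still earn a reward at power level 1 but not at power level 2.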

Belief State: At the beginning of a slot, the secondary user
has the information of its own residual energy $E$ but not the
current spectrum occupancy state $\mathbf{S}$. Its knowledge of $\mathbf{S}$ based
on all past decisions and observations can be summarized by a
belief state $\Lambda = \{\lambda_{\mathbf{s}}\}_{\mathbf{s} \in \{0,1\}^N}$ [10], where $\lambda_{\mathbf{s}}$ is the conditional
probability (given the decision and observation history) that the
network state is $\mathbf{S} = \mathbf{s}$ at the beginning of this slot prior to the
transition in the spectrum occupancy state.

At the end of a slot, the secondary user can update the belief
state $\Lambda$ for future use based on sensing action $a$ and sensing
outcome $\Theta_a$ in this slot. Specifically, let $\Lambda' = T_\Lambda(\Lambda \mid a, k)$ denote
the updated belief state whose element $\lambda'_{\mathbf{s}}$ denotes the probability
that the current spectrum occupancy state is $\mathbf{S} = \mathbf{s}$ given belief
state $\Lambda$ at the beginning of this slot and the observation $\Theta_a = k$
of chosen channel $a$ in the current slot. Applying Bayes' rule, we
obtain $\lambda'_{\mathbf{s}}$ as

$$\lambda'_{\mathbf{s}} = \Pr\{\mathbf{S} = \mathbf{s} \mid \Lambda, a, k\} = \begin{cases} \sum_{\mathbf{s}'} \lambda_{\mathbf{s}'}\, p_{\mathbf{s}',\mathbf{s}}, & a = 0, \\[4pt] \dfrac{\sum_{\mathbf{s}'} \lambda_{\mathbf{s}'}\, p_{\mathbf{s}',\mathbf{s}}\, 1_{[s_a = 1_{[k \ne 0]}]}}{\sum_{\mathbf{s}''} \sum_{\mathbf{s}'} \lambda_{\mathbf{s}'}\, p_{\mathbf{s}',\mathbf{s}''}\, 1_{[s''_a = 1_{[k \ne 0]}]}}, & a \ne 0, \end{cases} \tag{8}$$

where the summations are taken over the space $\{0,1\}^N$ of
spectrum occupancy state $\mathbf{S}$. Note that when the secondary user
turns to sleeping mode ($a = 0$), no observation is made and
the belief state is updated according to the spectrum occupancy
dynamics $\{p_{\mathbf{s},\mathbf{s}'}\}$.
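The Bayes update in (8) amounts to a prediction step through the Markov chain followed by conditioning on whether the sensed channel was idle. The sketch below assumes the belief is stored as a dictionary from occupancy tuples to probabilities and the transition probabilities as a nested dictionary `P[s_prev][s_next]`; these names and data structures are illustrative, not the paper's.

```python
def update_belief(belief, P, a, k):
    """Belief update of (8).

    belief : dict mapping each occupancy tuple s to its probability
    P      : P[s_prev][s_next] = transition probability of the occupancy chain
    a      : sensing action (0 = sleep, otherwise the sensed channel)
    k      : sensing outcome (0 = busy, k >= 1 = idle)
    """
    states = list(belief)
    # Prediction: push the belief through one step of the Markov chain.
    predicted = {s: sum(belief[sp] * P[sp][s] for sp in states) for s in states}
    if a == 0:
        return predicted                     # no observation while sleeping
    idle = 1 if k != 0 else 0                # what the outcome says about s_a
    kept = {s: (w if s[a - 1] == idle else 0.0) for s, w in predicted.items()}
    total = sum(kept.values())               # zero only for impossible outcomes
    return {s: w / total for s, w in kept.items()}
```

Only the event "idle" or "busy" enters the update, which is why all non-zero outcomes lead to the same updated belief.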

Unconstrained POMDP Formulation: We have formulated the
energy-constrained OSA as a POMDP problem. A policy $\pi$ of
this POMDP is defined as a sequence of functions:

$$\pi = [\mu_1, \mu_2, \ldots], \qquad \mu_t : [0,1]^{2^N} \times \mathcal{E}_r \to \mathcal{A},$$

where $\{a, \Phi_a\} = \mu_t(\Lambda, E)$ maps every information state $(\Lambda, E)$,
which consists of belief state $\Lambda \in [0,1]^{2^N}$ and residual energy
$E \in \mathcal{E}_r$, at the beginning of slot $t$ to a sensing decision
$a \in \{0, 1, \ldots, N\}$ and a set of access decisions
$\Phi_a = [\Phi_a(1), \ldots, \Phi_a(L)] \in \{0,1\}^L$.

The design objective is to find the optimal policy $\pi^*$ that
maximizes the total expected reward:

$$\pi^* = \arg\max_{\pi}\ \mathbb{E}_\pi \Big[ \sum_{t=1}^{\infty} R_{E,\Theta_a}^{(a,\Phi_a(\Theta_a))}(t) \,\Big|\, \Lambda_0 \Big], \tag{9}$$

where $\Lambda_0$ is the initial belief state given by the stationary
distribution of spectrum occupancy. We thus have an unconstrained
POMDP.


B. Optimal Policy

Let $V(\Lambda, E)$ be the value function, which denotes the maximum
expected remaining reward that can be accrued when the
current information state is $(\Lambda, E)$. We notice from (7) that the
value function is given by $V(\Lambda, E) = 0$ for any information
state $(\Lambda, E)$ with residual energy $E < e_s + \varepsilon_1$. For any other
information state, its value function $V(\Lambda, E)$ is the unique
solution to the following equation:

$$V(\Lambda, E) = \max_{(a,\phi) \in \mathcal{A}} \sum_{k=0}^{L} u_k^{(a)} \Big[ R_{E,k}^{(a,\phi(k))} + V\big(T_\Lambda(\Lambda \mid a, k),\ T_E(E \mid a, k, \phi(k))\big) \Big], \tag{10}$$

where $T_\Lambda(\Lambda \mid a, k)$ is the updated belief state given in (8),
$T_E(E \mid a, k, \phi(k))$ is the reduced battery energy given in (5),
and $u_k^{(a)} = \Pr\{\Theta_a = k \mid \Lambda\}$ is the probability of observing
$\Theta_a = k$ given belief state $\Lambda$, which is determined by the spectrum
occupancy dynamics and the channel fading statistics:

$$u_k^{(a)} = \sum_{\mathbf{s}' \in \{0,1\}^N} \sum_{\mathbf{s} \in \{0,1\}^N} \lambda_{\mathbf{s}'}\, p_{\mathbf{s}',\mathbf{s}}\, q_{\mathbf{s}}^{(a)}(k). \tag{11}$$

In principle, by solving (10), we can obtain the optimal sensing
and access actions $(a^*, \Phi^*)$ that achieve the maximum expected
reward $V(\Lambda, E)$ for each possible information state $(\Lambda, E)$. We
can also obtain the maximum expected number of information
bits $V_{opt}$ that can be delivered by a secondary user during its
battery lifetime as $V_{opt} = V(\Lambda_0, E_0)$, where $\Lambda_0$ is the initial
belief state.

C. Transceiver Synchronization

Without a dedicated communication or control channel,
transceiver synchronization is a key issue in distributed MAC design
for OSA networks [9]. Specifically, a secondary user and its
intended receiver need to hop to the same channel at the
beginning of each slot in order to carry out the communication
[9]. Here we show that the optimal sensing and access policy
developed in Section III-B ensures transceiver synchronization.

The protocol structure described in Section II-A ensures that
both the transmitter and the receiver have the same information
on the occupancy state and the fading condition of the sensed
channel in each slot. Hence, at the end of each slot, the
transmitter and the receiver will reach the same updated belief state $\Lambda$
using (8) and the same residual energy $E$ of the transmitter using
(5). Since the channel selection is determined by the information
state $(\Lambda, E)$, the transmitter and the receiver will hop to the
same channel in the next slot, i.e., transceiver synchronization is
maintained.

IV. Optimal Policy with Reduced Complexity

Although the value function given in (10) can be solved
iteratively, it is computationally expensive. In this section, we
first identify the sources of high complexity of the optimal policy
and then reduce the complexity accordingly.


A.  Complexity  of  the  Optimal  Policy 

We  measure  the  computational  complexity  of  a  policy  as  the 
number  of  multiplications  required  to  obtain  all  sensing  and 
access  actions  during  the  secondary  user’s  battery  lifetime  T 
when  initial  belief  state  and  battery  energy  are  given. 

From  (10),  we  notice  that  the  optimal  sensing  and  access 
action  in  the  first  slot  depends  on  the  value  functions  of  all 
possible  information  states  during  the  battery  lifetime  T.  Hence, 
the  computational  complexity  of  the  optimal  policy  is  determined 
by  the  number  of  multiplications  required  to  calculate  the  value 
functions  of  all  possible  information  states. 

Following the complexity analysis in [11], we can calculate
the number of all possible information states $(\Lambda, E)$ during the
secondary user's battery lifetime. Specifically, noting from (8)
that the updated belief state is the same under all non-zero
sensing outcomes ($k \ne 0$), we can see that each information
state $(\Lambda, E)$ can transit to at most $L + 1$ different information
states under sensing action $a \ne 0$ but only one under sensing
action $a = 0$. Hence, for fixed initial information state $(\Lambda_0, E_0)$,
the number of all possible information states is on the order of
$O((N(L+1))^{T-1})$, which is exponential in the battery lifetime
$T$ and polynomial in the number $N$ of channels. Moreover, from
(10) and (11), we can see that it requires $O(3|\mathcal{A}|\, 2^N 2^N (L+1))$
multiplications to calculate each value function, where $|\mathcal{A}|$ is the
size of the action space, $2^N$ is the dimension of the belief state,
and $L + 1$ is the number of possible observations. Therefore, the
computational complexity of the optimal policy is on the order
of $O(3|\mathcal{A}|\, 2^N 2^N (L+1)(N(L+1))^{T-1})$. We can see that the
complexity is mainly caused by the following three factors: 1)
the number $O((N(L+1))^{T-1})$ of possible information states;
2) the size $|\mathcal{A}|$ of the action space; and 3) the dimension $2^N$ of
the belief state. We will address the first factor in Section V. In
this section, we focus on the other two factors.

B. Reduction of Action Space Size

Careful inspection of (5), (7) and (10) reveals that the quantity
$R_{E,k}^{(a,\phi(k))} + V(T_\Lambda(\Lambda \mid a, k), T_E(E \mid a, k, \phi(k)))$ inside the square
brackets of (10) only depends on the $k$-th entry $\phi(k)$ of the
access decision $\phi$ and is independent of $\phi(i)$ ($i \ne k$). We can
thus simplify (10) as

$$V(\Lambda, E) = \max_{a \in \{0,1,\ldots,N\}} \Big\{ \sum_{k=0}^{L} u_k^{(a)} \max_{\phi(k) \in \{0,1\}} \Big[ R_{E,k}^{(a,\phi(k))} + V\big(T_\Lambda(\Lambda \mid a, k),\ T_E(E \mid a, k, \phi(k))\big) \Big] \Big\}. \tag{12}$$

Note that the maximization in (12) is taken over a space of
size $O(2NL)$, which increases linearly with the number $L$ of power
levels, while that in (10) is taken over the action space $\mathcal{A}$, whose
size $O(N 2^L)$ increases exponentially with $L$.

In Proposition 1, we show that the optimal access decision $\Phi_a^*$
for sensed channel $a$ is of threshold type.

Proposition 1: Given the belief state $\Lambda$ and the residual
energy level $E$ of the secondary user at the beginning of a
slot, there exists a threshold $k^*$ associated with sensing action
$a \in \{1, \ldots, N\}$ such that the optimal access decision
$\Phi_a^* = [\phi_a^*(1), \ldots, \phi_a^*(L)]$ is given by

$$\phi_a^*(k) = \begin{cases} 1, & \text{if } k \le k^*, \\ 0, & \text{if } k > k^*. \end{cases} \tag{13}$$

Proof: Assume $\phi_a^*(k^*) = 1$ for some $1 \le k^* \le L$.
For any $1 \le k < k^*$, we have $\varepsilon_k < \varepsilon_{k^*}$. From (5), we
have $T_E(E \mid a, k, 1) \ge T_E(E \mid a, k^*, 1)$ and
$T_E(E \mid a, k, 0) = T_E(E \mid a, k^*, 0)$. From (8), we have
$T_\Lambda(\Lambda \mid a, k) = T_\Lambda(\Lambda \mid a, k^*)$.
Combining the above observations and noting that the
total expected reward $V(\Lambda, E)$ increases with $E$ for any fixed
$\Lambda$, we can show that $B_a + V(T_\Lambda(\Lambda \mid a, k), T_E(E \mid a, k, 1)) \ge
V(T_\Lambda(\Lambda \mid a, k), T_E(E \mid a, k, 0))$. Therefore $\phi_a^*(k) = 1$ for any
$1 \le k < k^*$. The existence of $k^*$ follows from the fact that
there are a finite number of observations.

Proposition 1 can help us avoid the search for optimal access
decisions in some scenarios, resulting in further complexity
reduction. Specifically, for each sensing action $a \ne 0$, we can
calculate the optimal access decisions $\phi_a^*(k)$ in decreasing order
of sensing outcome $k$. Once we have $\phi_a^*(k^*) = 1$ for a certain
value of $k^*$, we can determine the optimal access decisions for all
remaining sensing outcomes $k < k^*$ without further computation.
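The decreasing-order search suggested by Proposition 1 can be sketched as follows. The two callables `value_if_access` and `value_if_skip` are hypothetical stand-ins for the bracketed quantity in (12) evaluated at access decision 1 and 0; how they are computed is outside this sketch.

```python
def threshold_access(value_if_access, value_if_skip, L):
    """Find the threshold-type access decision of (13) for one sensed channel.

    Scans outcomes k = L, L-1, ..., 1 and stops at the first k where
    transmitting is at least as good as not transmitting.
    Returns (phi, evaluations), with phi[k] the decision for outcome k.
    """
    phi = [0] * (L + 1)          # phi[0] = 0: never transmit on a busy channel
    evaluations = 0
    for k in range(L, 0, -1):
        evaluations += 1
        if value_if_access(k) >= value_if_skip(k):
            # Threshold found: every outcome j <= k also transmits.
            for j in range(1, k + 1):
                phi[j] = 1
            break
    return phi, evaluations
```

In the best case the scan stops at `k = L` after a single comparison; in the worst case all `L` outcomes are compared, which is no worse than the exhaustive search.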


C.  Reduction  of  Belief  State  Dimension 

Assume that the spectrum occupancy evolves independently
across channels. It has been shown in [7] that $\Omega = [\omega_1, \ldots, \omega_N]$,
where $\omega_n$ denotes the probability (conditioned on all previous
decisions and observations) that channel $n$ is available at the
beginning of a slot prior to the state transition, is a sufficient
statistic for belief state $\Lambda$. Note that the dimension of $\Omega$ increases
linearly, $O(N)$, with the number $N$ of channels while that of $\Lambda$
increases exponentially, $O(2^N)$.

Applying the belief state $\Omega$, we can simplify the value function
given in (12). Specifically, let $\alpha_n = \Pr\{S'_n = 1 \mid S_n = 0\}$ denote
the probability that channel $n$ transits from 0 (busy) to 1 (idle)
and $\beta_n = \Pr\{S'_n = 1 \mid S_n = 1\}$ the probability that channel $n$
remains idle. Then, (12) reduces to

$$V(\Omega, E) = \max_{a \in \{0,1,\ldots,N\}} \Big\{ (1 - \omega'_a)\, V\big(T_\Omega(\Omega \mid a, 0),\ T_E(E \mid a, 0, 0)\big) + \omega'_a \sum_{k=1}^{L} p_a(k) \max_{\phi(k) \in \{0,1\}} \Big[ R_{E,k}^{(a,\phi(k))} + V\big(T_\Omega(\Omega \mid a, k),\ T_E(E \mid a, k, \phi(k))\big) \Big] \Big\}, \tag{14}$$

where $\omega'_0 = 0$, $\omega'_a = \omega_a \beta_a + (1 - \omega_a)\alpha_a$ ($a \in \{1, \ldots, N\}$) is the
probability that channel $a$ is available in the current slot given
$\Omega$, $T_E(E \mid a, k, \phi(k))$ is the reduced battery energy given in (5),
and the $n$-th entry of the updated belief state $T_\Omega(\Omega \mid a, k)$ is
given by

$$[T_\Omega(\Omega \mid a, k)]_n = \begin{cases} 0, & \text{if } a \ne 0,\ n = a,\ k = 0, \\ 1, & \text{if } a \ne 0,\ n = a,\ k \ne 0, \\ \omega'_n, & \text{otherwise}. \end{cases} \tag{15}$$
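The recursion (14) with the belief update (15) can be evaluated by memoized recursion, since the residual energy strictly decreases in every slot. The following Python sketch is one possible implementation under stated assumptions: energies are integers with positive sensing and sleeping costs, and the factory name `make_value_function` and its argument layout are illustrative rather than taken from the paper.

```python
from functools import lru_cache

def make_value_function(alpha, beta, p, B, e_s, e_p, eps):
    """Return V(omega, E) solving (14) with the belief update (15).

    alpha[n], beta[n] : busy->idle and idle->idle probabilities of channel n+1
    p[n][k]           : probability that channel n+1 needs power level k
    B[n]              : bandwidth of channel n+1
    eps[k]            : transmission energy at level k, with eps[0] = 0
    """
    N, L = len(alpha), len(eps) - 1

    @lru_cache(maxsize=None)
    def V(omega, E):
        if E < e_s + eps[1]:                  # no reward can be earned any more
            return 0.0
        # omega'_n: availability of each channel after the state transition
        pred = tuple(w * bn + (1 - w) * an
                     for w, an, bn in zip(omega, alpha, beta))
        best = V(pred, E - e_p)               # a = 0: sleep, no observation
        for a in range(1, N + 1):
            busy = pred[:a - 1] + (0.0,) + pred[a:]   # (15) with k = 0
            idle = pred[:a - 1] + (1.0,) + pred[a:]   # (15) with k != 0
            w_a = pred[a - 1]
            total = (1 - w_a) * V(busy, E - e_s)
            skip = V(idle, E - e_s)           # access decision 0
            for k in range(1, L + 1):
                left = E - e_s - eps[k]
                gain = B[a - 1] if left >= 0 else 0.0
                send = gain + V(idle, max(left, 0))   # access decision 1
                total += w_a * p[a - 1][k] * max(skip, send)
            best = max(best, total)
        return best

    return V
```

For a single channel with `alpha = beta = 0.5`, one power level costing 2 units, and unit sensing and sleeping costs, `V((0.5,), 6)` evaluates to 1.1875 bits, which can be checked by hand from (14).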

V.  Suboptimal  Energy-Constrained  OSA 

From  (10),  we  notice  that  the  optimal  sensing  and  access 
decisions  in  a  slot  rely  on  the  value  functions  of  all  possible 
information  states  in  the  remaining  slots,  which  significantly 
increases  the  computational  complexity  of  the  optimal  policy. 
In this section, we provide a suboptimal solution to
energy-constrained OSA, which reduces the number of value functions
used in decision-making. We show that the computational
complexity of this suboptimal strategy can be traded off with its
performance.


A.  The  Greedy-w  Approach 


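Referred to as the greedy-$w$ approach, the proposed strategy
maximizes the total expected reward in a time window of $w$ slots. Let
$V_w^{(a)}(\Lambda, E)$ denote the maximum reward that can be accumulated
in a window of $w$ slots given information state $(\Lambda, E)$ and
sensing action $a$. We can calculate $V_w^{(a)}(\Lambda, E)$ recursively by

$$\begin{aligned} V_0^{(a)}(\Lambda, E) &= 0, \\ V_w^{(a)}(\Lambda, E) &= \sum_{k=0}^{L} u_k^{(a)} \max_{\phi(k) \in \{0,1\}} \Big[ R_{E,k}^{(a,\phi(k))} + \max_{b \in \{0,1,\ldots,N\}} V_{w-1}^{(b)}\big(T_\Lambda(\Lambda \mid a, k),\ T_E(E \mid a, k, \phi(k))\big) \Big], \end{aligned} \tag{16}$$

where $u_k^{(a)}$, $T_\Lambda(\Lambda \mid a, k)$, and $T_E(E \mid a, k, \phi(k))$ are given in (11),
(8), and (5), respectively. From (16), we can see that for any $w$,
$V_w^{(a)}(\Lambda, E) = 0$ if $E < e_s + \varepsilon_1$.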

Given  belief  state  A  and  residual  energy  E  of  the  secondary 
user  at  the  beginning  of  a  slot,  the  grecdy-u'  approach  chooses 
channel  aw  that  maximizes  the  reward  obtained  in  the  next  w 
slots  to  sense,  i.e.. 


aw  =  arg  max  Y.j,a\X,E).  (17) 

a£{0,l,...,JV} 


5  of  7 


Given  sensing  outcome  k  £  {1  the  access  decision 

4>aw  ( k )  of  the  greedy-w  approach  is  given  by 


<t>aw{k)  =  ai'g  max 

0G{O,1} 

+  max  l{Tx(X\aw,k),TE{E 

6e{i,...,tv} 


(18) 

aw,k,(f>))]}. 


Since its channel selection is determined by the information state
$(\Lambda, E)$, the greedy-w approach ensures transceiver synchronization
as shown in Section III-C.
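The recursion in (16) and the decision rule in (17) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The belief update, energy update, and reward below are simplified stand-ins for (8), (5), and (11), which are defined in earlier sections. Specifically, it assumes a per-channel belief vector, with `ALPHA` read as Pr(busy to idle) and `BETA` as Pr(idle to idle), a reward of `BITS[n]` per transmission, and energy costs `E_S`, `E_P`, and `E_TX`. All names are hypothetical, and the parameter values only loosely follow Fig. 1.

```python
from functools import lru_cache

# Hypothetical two-channel model, loosely following the Fig. 1 parameters.
# Energies are integers in units of 0.1 so that memoization keys are exact.
ALPHA = (0.2, 0.6)    # assumed Pr(busy -> idle) per channel
BETA = (0.8, 0.8)     # assumed Pr(idle -> idle) per channel
BITS = (1.0, 1.0)     # assumed reward: bits delivered by one transmission
E_S, E_P = 5, 1       # sensing / sleeping energy per slot
E_TX = (10, 20)       # transmission energy at fading level k = 1..L
P_FADE = (0.8, 0.2)   # Pr(fading level k), same for both channels
N = len(ALPHA)


def predict(omega):
    """Idle probability of each channel in this slot, given last slot's belief."""
    return tuple(w * b + (1 - w) * a for w, a, b in zip(omega, ALPHA, BETA))


@lru_cache(maxsize=None)
def v(w, a, omega, energy):
    """Stand-in for V_w^{(a)}: expected bits over w slots if action a is taken now."""
    if w == 0 or energy < E_S + E_TX[0]:
        return 0.0                      # empty window or terminating state
    idle = predict(omega)
    if a == 0:                          # a = 0: do not sense, pay only e_p
        return best(w - 1, idle, energy - E_P)
    ch = a - 1

    def updated(state):                 # the sensed channel becomes known
        return tuple(state if n == ch else idle[n] for n in range(N))

    # Sensing outcome k = 0: channel busy, no reward in this slot.
    total = (1 - idle[ch]) * best(w - 1, updated(0.0), energy - E_S)
    for p_k, e_k in zip(P_FADE, E_TX):  # k = 1..L: idle at fading level k
        wait = best(w - 1, updated(1.0), energy - E_S)
        send = wait
        if energy >= E_S + e_k:         # transmit only if the battery allows it
            send = BITS[ch] + best(w - 1, updated(1.0), energy - E_S - e_k)
        total += idle[ch] * p_k * max(wait, send)
    return total


def best(w, omega, energy):
    """Inner maximization over the next sensing action b in {0, ..., N}."""
    return max(v(w, b, omega, energy) for b in range(N + 1))


def greedy_action(w, omega, energy):
    """Sensing action in the spirit of (17); 0 means do not sense."""
    return max(range(N + 1), key=lambda a: v(w, a, omega, energy))
```

Calling `greedy_action(w, (0.5, 0.5), 40)` for increasing `w` shows how the window size changes the chosen action. Memoizing on `(w, a, omega, energy)` keeps the number of evaluated states small for short windows.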


Next,  we  consider  two  extreme  cases  of  the  greedy-w  strategy. 


Case 1: When $w = 1$, the greedy-1 approach focuses solely on
maximizing the immediate reward. Specifically, the secondary
user employing the greedy-1 approach chooses the channel with the
maximum expected immediate reward and transmits whenever
the channel is sensed to be available:

$$a_1 = \arg\max_{a \in \{1,\ldots,N\}} \sum_{k=0}^{L} u_k^{(a)} R_{E,k}^{(a,\, 1_{[k \neq 0]})}, \qquad \phi_{a_1}(k) = 1_{[k \neq 0]}. \tag{19}$$


The greedy-1 approach has the lowest computational complexity
but the worst performance, as illustrated in Fig. 1.
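As a rough illustration of the greedy-1 rule, the sketch below ranks channels by expected immediate reward. It assumes the reward is the bandwidth of the channel whenever it is sensed idle and the battery can afford the transmission, and that the idle probability in this slot follows from the belief `omega` through assumed transition probabilities `alpha` (busy to idle) and `beta` (idle to idle). The function name and argument layout are hypothetical.

```python
def greedy1_channel(omega, alpha, beta, bits, p_fade, e_tx, e_s, energy):
    """Return the 1-based channel with the largest expected immediate reward."""
    def expected_reward(n):
        # Probability that channel n is idle in this slot.
        idle = omega[n] * beta[n] + (1 - omega[n]) * alpha[n]
        # Transmit at every fading level the battery can still afford.
        affordable = sum(p for p, e in zip(p_fade, e_tx) if energy >= e_s + e)
        return idle * affordable * bits[n]

    return 1 + max(range(len(omega)), key=expected_reward)


# Fig. 1 parameters with an assumed belief of [0.5, 0.5] and energy 4.
choice = greedy1_channel((0.5, 0.5), (0.2, 0.6), (0.8, 0.8), (1, 1),
                         (0.8, 0.2), (1, 2), e_s=0.5, energy=4.0)
```

With these inputs the idle probabilities are 0.5 and 0.7, so the rule selects the second channel.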

Case  2:  Consider  the  case  when  window  size  w  exceeds  the 
maximum  battery  lifetime  of  the  secondary  user.  In  this  case, 
the  network  reaches  a  terminating  state  in  less  than  w  slots 
regardless  of  the  sensing  and  access  strategies.  Since  no  reward 
is  accumulated  after  the  network  reaches  a  terminating  state,  the 
greedy-w  approach  is  equivalent  to  the  optimal  strategy. 


B.  Complexity  Vs.  Performance 


Fig. 1. The number of information bits that can be transmitted by the secondary
user during its battery lifetime. $N = 2$, $[B_1, B_2] = [1, 1]$, $[\alpha_1, \alpha_2] = [0.2, 0.6]$,
$[\beta_1, \beta_2] = [0.8, 0.8]$, $e_s = 0.5$, $e_p = 0.1$, $L = 2$, $\mathcal{E}_{tx} = \{1, 2\}$, $p_n(1) = 0.8$,
$p_n(2) = 0.2$ for $n = 1, 2$.


We can see from (17) and (18) that the sensing and access
decisions made by the greedy-w approach in a slot depend only
on the value functions of all possible information states in the
next $w$ slots. Hence, the total number of value functions required
to determine the sensing and access decisions during battery
lifetime $T$ is on the order of $O((N(L+1))^{w-1} T)$, which is
linear in $T$. Clearly, the computational complexity of the greedy-w
approach increases with $w$.
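To give a feel for this growth, the following sketch evaluates the order-of-magnitude expression $(N(L+1))^{w-1} T$ for a few window sizes. Constant factors are ignored, the function name is hypothetical, and the lifetime of 10 slots is an arbitrary illustrative value rather than a figure from the paper.

```python
def value_function_count(n_channels, n_fading_levels, window, lifetime):
    """Evaluate (N(L+1))^(w-1) * T; an order of magnitude, not an exact count."""
    return (n_channels * (n_fading_levels + 1)) ** (window - 1) * lifetime


# N = 2 channels and L = 2 fading levels as in Fig. 1; T = 10 is assumed.
counts = {w: value_function_count(2, 2, w, lifetime=10) for w in (1, 2, 3, 4)}
```

Each extra slot in the window multiplies the count by $N(L+1) = 6$ here, while doubling the lifetime only doubles it.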

Next, we compare the performance of the greedy-w approach
with the optimal performance $V(\Lambda_0, \mathcal{E}_0)$. In Fig. 1, we plot the
total expected number of information bits that can be delivered
by the secondary user during its battery lifetime as a function of
the initial energy $\mathcal{E}_0$. We consider $N = 2$ independently evolving
channels with different occupancy dynamics. As the window
size $w$ increases, the performance of the greedy-w approach
improves and quickly approaches the optimal performance.

The above observations show that the computational complexity
of the greedy-w approach increases while its performance loss
as compared to the optimal performance decreases as the window
size $w$ increases. Hence, by choosing a suitable $w$, the greedy-w
approach can achieve a desired tradeoff between complexity and
performance.


VI.  Numerical  Examples 

Careful inspection of (10) reveals that a sensing and access
action $(a, \phi) \in \mathcal{A}$ affects the total expected reward in three
ways: 1) it acquires an immediate reward in this slot;
2) it transforms the current belief state $\Lambda$ to $T_\Lambda(\Lambda \mid a,k)$,
which summarizes the information of spectrum occupancy up
to this slot; 3) it causes a reduction in battery energy from
$E$ to $T_E(E \mid a,k,\phi(k))$, leading to a shorter remaining battery
lifetime. Hence, to maximize the total expected reward during
battery lifetime, the optimal sensing and access policy should
achieve a tradeoff among gaining instantaneous reward, gaining
information for future use, and conserving energy. In this section,
we study the impact of spectrum occupancy dynamics, channel
fading statistics, and energy consumption characteristics on the
optimal sensing and access actions.

To sense or not to sense? The secondary user may choose to
sense in order to gain immediate reward and channel occupancy
information, or not to sense in order to conserve energy. Hence,
the optimal decision on whether to sense should strike a balance
between gaining reward/information and conserving energy. In
Fig. 2, we study the optimal sensing decision $1_{[a^* \neq 0]}$ in a
particular slot under different spectrum occupancy dynamics and
belief states.


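[Fig. 2 plot: optimal sensing decision versus primary occupancy dynamics $\alpha$, for belief states $\omega_1 = [0.5, 0.5]$ and $\omega_2 = [0, 0]$.]

Fig. 2. The optimal decision $1_{[a^* \neq 0]}$ on whether to sense under different
spectrum occupancy dynamics and belief states. $N = 2$, $[B_1, B_2] = [1, 1]$,
$\mathcal{E}_0 = 4$, $e_s = 0.6$, $e_p = 0.1$, $L = 2$, $\mathcal{E}_{tx} = \{1, 2\}$, $p_n(1) = p_n(2) = 0.5$ for
$n = 1, 2$.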

We consider $N = 2$ independently evolving channels with
identical spectrum occupancy dynamics $\alpha_1 = \alpha_2 = \alpha$ and
$\beta_1 = \beta_2 = \beta$. We assume that $\beta = 1 - \alpha$. Hence, the
stationary distribution of spectrum occupancy state $S$ is given by
$\omega_1 = [0.5, 0.5]$. Consider another belief state $\omega_2 = [0, 0]$ with
which the secondary user has full information on the spectrum
occupancy prior to the state transition in this slot. Conditioned
on the belief states at the beginning of this slot, the conditional
probability that channel $n$ is available can be calculated as
$\Pr\{S_n = 1 \mid \omega_1\} = 0.5$ and $\Pr\{S_n = 1 \mid \omega_2\} = \alpha$ for $n = 1, 2$.
From Fig. 2, we find that the secondary user chooses not to
sense only when the conditional probability $\Pr\{S_n = 1 \mid \omega\}$
that the channel is available is very small. We also find that
the secondary user always chooses to sense if the belief state is
given by the stationary distribution $\omega_1$ of the spectrum occupancy
dynamics. The reason behind this is the monotonicity of the value
function $V(\omega, E)$ in terms of battery energy $E$. Specifically, if
the secondary user chooses not to sense, then its belief state
at the beginning of the next slot will remain $\omega_1$ but its battery
energy will be reduced by $e_p$ due to energy consumption in the
sleeping mode. The maximum total expected reward that can
be obtained is thus given by $V(\omega_1, E - e_p)$. Since $V(\omega, E)$
increases with the battery energy $E$ for every fixed $\omega$, we have
$V(\omega_1, E) > V(\omega_1, E - e_p)$ and hence the secondary user should
choose to sense whenever it has a stationary belief state.
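The two conditional probabilities quoted above can be checked with a short derivation. It assumes that $\alpha$ is the probability that a busy channel becomes available and $\beta$ the probability that an available channel stays available, a reading consistent with $\Pr\{S_n = 1 \mid \omega_2\} = \alpha$. The symbol $\pi$ for the stationary availability probability is introduced here for illustration only.

```latex
\begin{align*}
\pi &= \pi\beta + (1-\pi)\alpha
  \;\Longrightarrow\; \pi = \frac{\alpha}{1+\alpha-\beta}, \\
\beta = 1-\alpha &\;\Longrightarrow\; \pi = \frac{\alpha}{2\alpha} = \frac{1}{2}
  \quad (\alpha > 0), \\
\Pr\{S_n = 1 \mid \omega_1\} &= 0.5\,\beta + 0.5\,\alpha = 0.5, \qquad
\Pr\{S_n = 1 \mid \omega_2\} = 0\cdot\beta + 1\cdot\alpha = \alpha .
\end{align*}
```

Under this reading, a belief of $0.5$ per channel is mapped to itself by one state transition, which is why the belief remains $\omega_1$ when the user does not sense.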


Fig. 3. The optimal access decision under different sensing energy consumptions
$e_s$ and channel fading statistics. $N = 2$, $[B_1, B_2] = [1, 1]$, $\mathcal{E}_0 = 8$, $e_p = 0.1$,
$L = 3$, $\mathcal{E}_{tx} = \{1, 2, 3\}$. In the upper plot, $p_n(1) = 0.5$, $p_n(2) = 0.3$, $p_n(3) = 0.2$
for $n = 1, 2, 3$. In the lower plot, $p_n(1) = 0.3$, $p_n(2) = 0.3$, $p_n(3) = 0.4$.

To access or not to access? Without an energy constraint,
the secondary user should always access the channel that is
sensed to be available. However, under the energy constraint,
the access decision should take into account both the energy
consumption characteristics and the channel fading statistics. For
example, when the sensed channel is available but has a poor
fading condition, should the secondary user access this channel
to gain immediate reward or wait for better channel realizations
to conserve energy? In Fig. 3, we study the optimal access
decision $\phi^*$ under different sensing energy consumptions $e_s$ and
channel fading statistics $\{p_n(k)\}_{k=1}^{L}$. We find that when sensing
energy consumption $e_s$ is negligible, the secondary user should
refrain from transmission under poor channel conditions and wait
for the best channel realization. However, when $e_s$ is large, it
should always grab the instantaneous opportunity regardless of
the fading condition because the sensing energy consumed in
waiting for the best channel realization may exceed the extra
energy consumed in combating the poor channel fading.
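The qualitative trend just described can be illustrated with a back-of-the-envelope calculation. It is not the POMDP solution and is not meant to reproduce Fig. 3. The sketch assumes the sensed channel is always available, every slot costs `e_s` to sense, and the user transmits only when the fading level is at most a threshold `m`. The average energy spent per delivered bit is then the expected energy per slot divided by the expected bits per slot. Function names are hypothetical; the fading statistics are those of the upper plot of Fig. 3.

```python
def energy_per_bit(e_s, p_fade, e_tx, m, bits=1.0):
    """Average energy per delivered bit when transmitting only at levels 1..m.

    Assumes the sensed channel is always idle and each slot costs e_s to sense.
    """
    q = sum(p_fade[:m])                                     # Pr(transmit in a slot)
    tx = sum(p * e for p, e in zip(p_fade[:m], e_tx[:m]))   # mean transmit energy per slot
    return (e_s + tx) / (q * bits)


def best_threshold(e_s, p_fade, e_tx):
    """Fading-level threshold m that minimizes the energy spent per bit."""
    return min(range(1, len(p_fade) + 1),
               key=lambda m: energy_per_bit(e_s, p_fade, e_tx, m))


p_fade, e_tx = (0.5, 0.3, 0.2), (1, 2, 3)
patient = best_threshold(0.0, p_fade, e_tx)   # negligible sensing energy
eager = best_threshold(5.0, p_fade, e_tx)     # large sensing energy
```

With free sensing the cheapest policy waits for the best fading level (threshold 1), while with costly sensing it transmits at every level (threshold 3), matching the direction of the trade-off in the text.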

The access decision should also take into account the channel
fading statistics. Comparing the optimal access decisions in the
upper and the lower plots of Fig. 3 when the sensing energy is
$e_s = 0.8$, we find that if the probability that the channel experiences
deep fading is small (see the upper plot), the secondary
user should avoid transmitting under poor channel realizations
because the waiting time for a better channel realization is short
and hence the energy wasted in waiting can still be lower than
the extra energy needed to combat the poor channel condition.
On the other hand, if the channel tends to have poor fading
conditions (see the lower plot), the secondary user should focus
on gaining immediate reward because of the long waiting time
for better channel realizations.

VII.  Conclusion 

In  this  paper2,  we  obtained  the  optimal  sensing  and  access 
policy  for  energy-constrained  OSA  by  formulating  the  resulting 
problem  as  an  unconstrained  POMDP.  We  proposed  a  suboptimal 
solution,  called  greedy-w,  whose  computational  complexity  can 
be  systematically  traded  off  with  its  performance.  Numerical 
results demonstrated that the optimal sensing and access decisions
should take into account not only the spectrum occupancy
dynamics  but  also  the  channel  fading  statistics  and  the  energy 
consumption  characteristics  of  the  secondary  user. 

Throughout  this  paper,  we  have  assumed  perfect  spectrum 
sensing,  i.e.,  the  sensing  outcome  reflects  the  true  channel  state. 
Our  future  work  on  energy-constrained  OSA  will  address  the 
design  of  spectrum  sensing  and  access  policy  in  the  presence  of 
spectrum  sensing  errors. 